Application transmittal 



UTILITY
PATENT APPLICATION
TRANSMITTAL



(new non-provisional applications under 37 CFR 1.53(b))



Assistant Commissioner for Patents
Box Patent Application
Washington D.C. 20231



Attorney Docket No. Golestani 3



First Named Inventor or Application Identifier S. Jamaloddin Golestani



Title End-to-End Internet Congestion Control 



Express Mail Label no. EE046546821 US 



ACCOMPANYING APPLICATION PARTS



APPLICATION ELEMENTS



[X] Fee Transmittal Form (original and duplicate)

[X] Specification Total Pages 28

title 

cross reference to related applications (e.g. provisional application) 

background 

summary

brief description of the drawings (if filed) 

detailed description

claims 

abstract 

Drawing(s) Total Pages 2

Declaration Total Pages 2

Newly executed
[ ] Copy from a prior application (37 CFR 1.63(d))

(for continuations/divisionals with section below filled out)

[ ] Deletion of Inventor(s): Signed Statement attached deleting

inventor(s) named in the prior application, 37 CFR 1.63(d)(2)

and 1.33(b).

Incorporation by reference (usable if Declaration is a copy):
The entire disclosure of the prior application, from which a copy of the oath or declaration
is supplied, is considered as being part of the disclosure of the accompanying application
and is hereby incorporated by reference herein.

[ ] Other






[X] Assignment

[X] Recordation form

[X] Power of Attorney

[X] Postcard

[ ] Small entity statement

[ ] Certified copy of priority documents

[ ] Information disclosure statement

[ ] Copies of IDS citations

[ ] 37 CFR 3.73(b) Statement
[ ] Check

[ ] Other



If a CONTINUING APPLICATION, check appropriate box and supply the requisite information:

[ ] Continuation [ ] Divisional [ ] Continuation-in-part (CIP) of prior Application No:



CORRESPONDENCE ADDRESS 



Customer Number or Bar Code Label



(insert Customer No. or Attach bar code label here) 



Correspondence Address below 



NAME 



Henry T. Brendzel 



ADDRESS P.O. Box 574, Springfield, NJ 07081 



COUNTRY United States 



FAX (973)467-6589 



SIGNATURE OF APPLICANT, ATTORNEY, OR AGENT



Name



Henry T. Brendzel 



Reg. No. 26,844 



Telephone 



(973) 467-2025 





Signature 



Date 



I hereby certify that this Application is being deposited with the United States Postal Service "Express Mail Post Office to Addressee" service under
37 CFR 1.10 on the date indicated above and is addressed to the Assistant Commissioner for Patents, Washington D.C. 20231.



Henry Brendzel 



Date of Deposit 



(Printed Name of Person Mailing Paper) 




(Signature of Person Mailing Paper)



End-to-End Internet Congestion Control 

Background of the Invention 

This invention relates to congestion control in networks and, more
particularly, to congestion control in the Internet.

Congestion control in packet networks has proven to be a difficult problem,
generally. In the Internet, however, this problem is particularly challenging, due to
the very limited observability and controllability of the network. In order to
accommodate rapid growth and proliferation, the design of the IP protocol and the
requirements placed on individual subnetworks have been kept to a minimum.
Consequently, the main form of congestion control possible in the current Internet
is end-to-end control of user traffic at the transport layer. As exemplified by TCP,
this control must be exerted using only the limited network observation that
sessions can make locally, based on their own performance. The prevalent form
of service discipline in the Internet is FIFO queueing, and control approaches that
are based on more sophisticated service disciplines are not easily applicable.

Although the current TCP congestion control has been relatively
successful, its ability to optimally control congestion is exceedingly stretched by
the rapid growth of the Internet and the proliferation of both real-time and multicast
services. Over the past several years, considerable effort has been directed at
improving the existing techniques of congestion control in the Internet and at
introducing new approaches to accommodate the requirements of new services
and applications. For example, methods of enforcing fairness or user priorities
have been extensively studied in recent years. They are usually based on
non-FIFO service scheduling at network switches, where traffic streams meet and
competition for resources arises. Also, methods for network congestion control
based on optimization techniques have been studied which use distributed
computations. However, algorithms proposed for this purpose require
sophisticated network layer protocols, a luxury that is not available in the
Internet for end-to-end congestion control.

What is needed is a method for optimizing network usage without resort to
sophisticated network layer protocols.

Summary 

The problems found in the prior art are overcome, and an advance is
achieved, by treating end-to-end congestion control as a global optimization
problem. A class of minimum cost flow control (MCFC) algorithms for adjusting
session rates or window sizes is accordingly disclosed, where congestion control is
achieved through consideration of a cost function that addresses link congestion
and a cost function that addresses the cost of providing less than the desired
transmission rate. Significantly, these algorithms can be implemented at the
transport layer of an IP network and can provide certain fairness properties and
user priority options without requiring non-FIFO switches.

A coarse version of the algorithm is geared towards implementation in the
current Internet, relying on end-to-end packet loss observations as an indication
of congestion. A more complete version anticipates an Internet where sessions
can solicit explicit congestion information through a concise probing mechanism.

Brief Description of the Drawing 

FIG. 1 shows a packet network with a plurality of switching or routing nodes,
with links that interconnect the nodes, and a number of sessions that utilize the
network;

FIG. 2 illustrates a session cost function;
FIG. 3 illustrates a link congestion cost function; and
FIG. 4 presents a flow chart of the minimum cost flow control algorithm.

Detailed Description 

The dynamics of a network congestion control strategy can span multiple
time scales. On the fastest time scale, congestion control provides protection
against sudden surges of traffic by quick reaction to buffer overloads. The
reaction time in this type of control is, at best, on the order of one round-trip delay,
since that is how fast news of congestion can reach a source node and the
response to it propagate back to the trouble spot. On a slower time scale,
congestion control could mean gradual but steadier reaction to the build-up of
congestion, as perceived over a period involving tens, or hundreds, of round-trip
times. It is on this time scale that notions such as the average transmission rate
of a session, rate allocation, and fairness become meaningful. This disclosure
addresses itself to this quasi-static congestion control, where the control time
scale is the "medium-term" tens, or hundreds, of round-trip times.

A window scheme for end-to-end congestion control employs an
arrangement whereby the amount of outstanding data for a given session is
limited to a maximum number of packets. This number is referred to as the
window size. In such an arrangement, a transmitting device is free to keep
sending packets, as long as the number of outstanding packets is less than the
window size. Outstanding packets are packets that were sent to a destination, for
which an acknowledgement was not yet received and no information is available
to indicate that the packets were lost. When the number of outstanding packets
reaches the maximum, transmission of packets is halted.
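
By way of illustration only, the window rule just described can be sketched in a few lines of Python. The class name `WindowSender` and its methods are hypothetical and are not part of this disclosure; the sketch merely tracks outstanding packets and refuses to send once their number reaches the window size.

```python
class WindowSender:
    """Toy sender that caps the number of outstanding packets at a window size."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.outstanding = set()  # sequence numbers sent but not yet resolved
        self.next_seq = 0

    def can_send(self):
        # Transmission is allowed only while outstanding < window size.
        return len(self.outstanding) < self.window_size

    def send(self):
        """Send one packet if the window allows; return its sequence number, else None."""
        if not self.can_send():
            return None
        seq = self.next_seq
        self.next_seq += 1
        self.outstanding.add(seq)
        return seq

    def resolve(self, seq):
        """Call when packet `seq` is acknowledged or is known to be lost."""
        self.outstanding.discard(seq)


sender = WindowSender(window_size=3)
sent = [sender.send() for _ in range(5)]  # only the first three packets go out
sender.resolve(0)                         # an acknowledgement frees one slot
resumed = sender.send()                   # transmission resumes
```

In this sketch the fourth and fifth send attempts are refused, and a single acknowledgement is enough to let the next packet out.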

Any particular session can, of course, control its window size and can,
therefore, change its window size in response to changing network conditions.
Thus, when it is determined by a party in control of a session that, for a given
window size, no congestion occurs for the session (i.e., no packets are lost), the
party might increase the window size and, thereby, effectively increase the rate of
transmission. In the TCP protocol, this dynamic control of window size is undertaken
in a conservative manner. That is, for each packet that is transmitted successfully
without a loss, the window size is increased only slightly. Conversely, for each
loss of a packet, the window size is reduced significantly (e.g., to half its value). In
this manner, the window size keeps changing, in a saw-tooth fashion, while
adjusting itself to the capacity of the network.
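
The saw-tooth behavior can be illustrated with a minimal Python sketch. The function `adjust_window` is hypothetical, and the particular increase of `1 / window` per successful packet is an assumption borrowed from common TCP practice, not something specified in this disclosure; only the "small increase, large decrease" pattern is taken from the text above.

```python
def adjust_window(window, packet_lost, increase=1.0, decrease_factor=0.5, min_window=1.0):
    """Return the new window size after one packet outcome.

    A loss cuts the window multiplicatively (here, to half its value);
    a success grows it only slightly (assumed: by increase / window).
    """
    if packet_lost:
        return max(min_window, window * decrease_factor)
    return window + increase / window


window = 8.0
trace = []
# Four successes, one loss, then three more successes.
for lost in [False, False, False, False, True, False, False, False]:
    window = adjust_window(window, lost)
    trace.append(window)
```

The resulting `trace` rises slowly, drops to half at the loss, and then begins to climb again.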
It should be noted that for any fixed window size, as the network becomes
congested and the round trip delay increases, the transmission rate is
concomitantly reduced. This reaction takes place within one round trip delay; i.e.,
it is a short-term operation. Thus, the window scheme provides a form of
dynamic congestion control even if the window size is not adjusted according to
network conditions. If modifying the window size in response to quasi-static
network conditions is permitted, then the window scheme combines dynamic and
quasi-static congestion control. In such an arrangement, the window size can be
set to the product of the medium-term average rate, $r_s$, and the medium-term
average round trip delay, $\tau_s$, i.e., $w_s = r_s \tau_s$.
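
As a small worked example of the relationship between window size, rate, and round trip delay, the following Python sketch converts between the two quantities. The function names and the numerical values are illustrative assumptions only.

```python
def window_from_rate(avg_rate_pps, avg_rtt_s):
    """Window size (packets) = average rate (packets/s) x average round trip delay (s)."""
    return avg_rate_pps * avg_rtt_s


def rate_from_window(window_packets, avg_rtt_s):
    """Rate sustained by a fixed window when the round trip delay is avg_rtt_s."""
    return window_packets / avg_rtt_s


w = window_from_rate(200.0, 0.05)        # 200 packets/s at 50 ms: about 10 packets
r_congested = rate_from_window(w, 0.10)  # same window, delay doubled: rate about halves
```

The second call illustrates the short-term effect noted above: with the window held fixed, a doubling of the round trip delay roughly halves the transmission rate.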

The congestion control method disclosed herein performs global
optimization in the network. That is, while the method contemplates that each
session would control its own transmission parameters, the optimization is global,
over all sessions that are active in the network. The disclosed method also
contemplates no exchange of information between the sessions, and no central
network measurement or control. As explained in more detail below, the global
optimization is realized by distributed participation, by each session undertaking to
execute an iterative algorithm. At least part of the method is performed by the
receiving end apparatus of each session. Information about the recommended
window size, or rate of transmission, is then communicated to the transmitting end
apparatus of the session through a feedback path that is part of another session
(from the receiving end apparatus serving as the transmitting end apparatus of
this other session). More specifically, the receiving end develops a
recommendation of the optimum window size or transmission rate and transmits
that to the transmitting end. Alternatively, the receiving end develops information
that it transmits to the transmitting end, and the recommended window size or
transmission rate is developed locally at the transmitting end from that information.

The above discussion about window sizes might lead one to believe that
the global optimization method disclosed herein is suitable for window-size
optimization. That is correct, but actually, the method disclosed herein is suitable
for both window size optimization and average transmission rate optimization. In
the following discussion, optimization of transmission rate is presented, but it
should be realized and understood that, based on the aforementioned relationship
between window size and transmission rate, window size optimization is easily
derivable.

In the equations that follow, the communication links are denoted by index
$l = 1, \ldots, L$, and the network sessions are denoted by index $s = 1, \ldots, S$. A session
corresponds to a one-way flow of traffic between a source and a destination. The
return traffic, which is effectively the feedback to the source, constitutes another
session. The medium-term average packet transmission rate of session $s$ is denoted by $r_s$, the
medium-term average rate of traffic through link $l$ is denoted by $f^l$, and the
vectors $\mathbf{r}$ and $\mathbf{f}$ represent $\mathbf{r} = (r_1, r_2, \ldots, r_S)$ and $\mathbf{f} = (f^1, f^2, \ldots, f^L)$, respectively.
This is illustrated in FIG. 1 with links l1 through l15 and sessions s1 through s4.

In accordance with the principles of this disclosure, the cost function to be
minimized is constructed from the point of view of a hypothetical network
services provider. The hypothetical network provider realizes that there is a cost
when the network fails to allocate bandwidth to users who are willing to pay.
Therefore, for each session $s$, a convex cost function, $e_s(r_s)$, is created that is a
function of $r_s$. More particularly, the created cost function, $e_s(r_s)$, is a decreasing
function of the rate $r_s$, as exemplified by the curve of FIG. 2. What the curve of
FIG. 2 effectively states is that as the offered, or available, average transmission
rate, $r_s$, is decreased, the cost, in terms of user dissatisfaction or actual revenue
lost, increases. The hypothetical network provider also realizes that there is a
cost when bandwidth is allocated to a session but the session is unable to take
advantage of the allocated bandwidth because of network congestion. Therefore,
for each communication link $l$ of the network, a convex cost function $g_l(f^l)$ is created
that is a function of $f^l$. This function increases with increased $f^l$, as exemplified
by the curve of FIG. 3. What the curve of FIG. 3 states is that as the flow
approaches the capacity of the link, $C^l$, the average queue length of messages
waiting to traverse the link increases, and the danger of congestion
goes up.

A packet network can employ two types of routing: single path, and
multipath. In single-path routing (such as the routing in the current Internet), the
entirety of the traffic takes a given path. In multipath routing, a session's traffic
may have different portions routed over different paths that lead to a given
destination. Of course, the routing tables could be updated over time for both
types of routing.

Considering first the more general, multipath, case, if $\phi_s^l$ is the fraction of
traffic of session $s$ that is carried over link $l$, the flow in a given link is given by

$$f^l = \sum_{s=1}^{S} \phi_s^l \, r_s, \qquad l = 1, \ldots, L, \qquad (1)$$

which simply says that the flow of each link is the sum of the fractions of flows of
all sessions carried over it. It is assumed in equation (1) that the number of
packets lost at link $l$ is negligible, and it is also assumed that the time scale of
routing updates is relatively long compared to the medium-term averaging time of
the congestion control algorithm.
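
The link-flow relationship of equation (1), in which each link's flow is the sum over sessions of the fraction of each session's rate routed over that link, can be sketched as follows. The function `link_flows`, the dictionary layout, and the numerical values are illustrative assumptions.

```python
def link_flows(rates, fractions):
    """Compute per-link flow as the sum over sessions of (fraction on link) x (session rate).

    rates:     {session: medium-term average rate}
    fractions: {session: {link: fraction of that session's traffic carried on the link}}
    """
    flows = {}
    for session, rate in rates.items():
        for link, fraction in fractions[session].items():
            flows[link] = flows.get(link, 0.0) + fraction * rate
    return flows


rates = {"s1": 4.0, "s2": 2.0}
fractions = {
    "s1": {"l1": 1.0, "l2": 0.5, "l3": 0.5},  # multipath: traffic splits after l1
    "s2": {"l2": 1.0},                        # single path
}
flows = link_flows(rates, fractions)
```

Here link l2 carries half of session s1 plus all of session s2.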

In accordance with this disclosure, network congestion control is based on
the following global optimization relationship:

$$\min_{\mathbf{r}} J(\mathbf{r}) = \sum_{s=1}^{S} e_s(r_s) + \sum_{l=1}^{L} g_l(f^l), \qquad (2)$$

subject to the condition that the rate allocated to each session is not less than
zero and not more than the rate desired by each session, i.e., $0 \le r_s \le r_s^d$. Note that since the
session and the link cost functions are convex, $e_s''(r_s) \ge 0$ and $g_l''(f^l) \ge 0$.

While equation (2) provides an expression for a global cost function, it has
already been stated that it is desired to have each session control its own rate. To
that end, an incremental reward function, $h_s(r_s)$, is defined for session $s$ by

$$h_s(\bar{r}_s) = -e_s'(\bar{r}_s), \qquad s = 1, 2, \ldots, S, \qquad (3)$$

where $\bar{r}_s$ is the average transmission rate that is actually achieved by session $s$,
and $e_s'$ is the derivative of the cost function, $e_s$; and a congestion measure
function, $\gamma_s(\mathbf{f})$, is defined for session $s$ by

$$\gamma_s(\mathbf{f}) = \frac{\partial}{\partial r_s} \sum_{l=1}^{L} g_l(f^l) = \sum_{l=1}^{L} \phi_s^l \, g_l'(f^l), \qquad (4)$$

where $\mathbf{f} = (f^1, f^2, \ldots, f^L)$ is the link flow vector corresponding to $\mathbf{r}$. Equation (3)
provides a measure of the sensitivity of the cost function of the sessions to
changes in the transmission rate. Equation (4) provides a measure of the
sensitivity of the total cost of congestion to changes in the flow of traffic through
the links that session $s$ is employing. With these formulations, when cost
functions $e_s$ and $g_l$ are such that $e_s'(r_s) < 0$ and $g_l'(f^l) > 0$, it can be shown that
necessary and sufficient conditions to minimize equation (2) are:

$$\begin{aligned}
h_s(r_s^*) &\le \gamma_s(\mathbf{f}^*) && \text{if } r_s^* = 0, \\
h_s(r_s^*) &= \gamma_s(\mathbf{f}^*) && \text{if } 0 < r_s^* < r_s^d, \\
h_s(r_s^*) &\ge \gamma_s(\mathbf{f}^*) && \text{if } r_s^* = r_s^d,
\end{aligned} \qquad s = 1, 2, \ldots, S, \qquad (5)$$

where $r_s^*$ is the optimized rate, and $\mathbf{f}^*$ is the flow vector when the sessions that
contribute to the flow are at their optimized rate. Of course, for single-path
routing, equation (4) reduces to

$$\gamma_s(\mathbf{f}) = \sum_{l \in \pi_s} g_l'(f^l), \qquad (6)$$

where $\pi_s$ denotes the path of session $s$. Stated in simpler terms, the sum in
equation (6) is taken over those links that carry the traffic of session $s$.
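
To make the congestion measure concrete, the following Python sketch evaluates it in both forms: as a sum of the links' incremental costs weighted by the routing fractions (multipath), and as a plain sum over the links of the path (single path). The function names and the incremental cost values are hypothetical.

```python
def congestion_measure(fractions_s, incremental_costs):
    """Multipath form: sum over links of (routing fraction) x (link incremental cost)."""
    return sum(phi * incremental_costs[link] for link, phi in fractions_s.items())


def congestion_measure_single_path(path, incremental_costs):
    """Single-path form: sum of the incremental costs of the links on the path."""
    return sum(incremental_costs[link] for link in path)


# Assumed current incremental congestion cost of each link.
g_prime = {"l1": 0.25, "l2": 0.5, "l3": 1.0}

gamma_multi = congestion_measure({"l1": 1.0, "l2": 0.5, "l3": 0.5}, g_prime)
gamma_single = congestion_measure_single_path(["l1", "l2"], g_prime)
```

The single-path form is simply the multipath form with every routing fraction on the path equal to one.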

Interpretation of the equation (5) optimality condition is straightforward: at
the optimal transmission rate, $r_s^*$, as long as the rate is not at the 0 and $r_s^d$
bounds, the session's incremental reward function is equal to the incremental
measure of congestion. If $r_s^*$ cannot be decreased (increased), then the session's
incremental reward function may be smaller (larger) than the session's congestion
measure.

Since the conditions in the network are not stationary, and there is no
knowledge of the behavior of other sessions, the optimization problem of equation
(2) cannot be solved in closed form even when the cost functions are expressed in
closed form. However, the constrained optimization problem of equation (2) can
be solved by means of a gradient projection algorithm where, with each iteration,
from the current rate, $r_s$, we first derive an auxiliary parameter,
$\tilde{r}_s = r_s + \mu \, (h_s(r_s) - \gamma_s(\mathbf{f}))$, where $\mu$ is a multiplicative step size coefficient. Then
we update $r_s$ by:

$$r_s \leftarrow \begin{cases}
0 & \text{if } \tilde{r}_s < 0, \\
\tilde{r}_s & \text{if } 0 \le \tilde{r}_s \le r_s^d, \\
r_s^d & \text{if } \tilde{r}_s > r_s^d,
\end{cases} \qquad (7)$$

where $r_s^d > 0$.

When this algorithm is carried out by all of the sessions, it converges to the
optimal point of equation (2), provided that the step size $\mu$ is chosen to be small
enough. I call this algorithm the minimum cost flow control (MCFC) algorithm.
Distributed execution of the iterations represented by equation (7) by various
sessions in the network is possible if, prior to each iteration, the current values of
congestion measures, $\gamma_s$, are available.

A priori knowledge of the desired session rates $r_s^d$ is not actually necessary
for the execution of the MCFC algorithm. When updating session rates, the upper
bound of $r_s^d$ can be simply disregarded, letting the course of action determine
whether or not a session $s$ is allocated its desired rate. In other words, the
iteration represented by equation (7) can be replaced with the following equation,

$$r_s \leftarrow \max\bigl(0,\ \bar{r}_s + \mu \, (h_s(\bar{r}_s) - \gamma_s(\mathbf{f}))\bigr), \qquad (8)$$

where $\bar{r}_s$ is the average rate that is actually utilized by session $s$ during the past
iteration, as compared to the allocated rate, $r_s$. For sake of simplicity, the
distinction between the allocated rate and the rate that is actually utilized is
ignored in the equations that follow, leading to the equation

$$r_s \leftarrow \max\bigl(0,\ r_s + \mu \, (h_s(r_s) - \gamma_s(\mathbf{f}))\bigr). \qquad (9)$$

This simplification is equivalent to assuming that sessions are always greedy, i.e.,
that they utilize whatever rate is allocated to them.
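
The simplified iteration, in which each greedy session moves its rate by a step proportional to the difference between its incremental reward and its congestion measure and clips the result at zero, can be sketched for a toy network as follows. Everything specific here is an assumption made for illustration: two identical sessions share one link of capacity 10, the incremental reward is taken as `A / rate`, and the link's incremental cost is taken as `flow / (CAPACITY - flow)`. None of these particular functions or values is prescribed by this disclosure.

```python
import math

CAPACITY = 10.0  # assumed capacity of the single shared link
A = 1.0          # assumed reward coefficient
MU = 0.1         # step size, chosen small enough for this toy case


def incremental_reward(rate):
    # Assumed decreasing incremental reward function of the session rate.
    return A / rate


def link_incremental_cost(flow):
    # Assumed incremental congestion cost; it grows without bound near capacity.
    return flow / (CAPACITY - flow)


def mcfc_step(rates):
    """One iteration: every session updates its own rate from the shared link's cost."""
    flow = sum(rates)
    gamma = link_incremental_cost(flow)  # one link only, so the measure is the same for all
    return [max(0.0, r + MU * (incremental_reward(r) - gamma)) for r in rates]


rates = [1.0, 1.0]
for _ in range(500):
    rates = mcfc_step(rates)

# For this symmetric toy case the optimum solves A / r = 2r / (CAPACITY - 2r).
expected = (-1.0 + math.sqrt(21.0)) / 2.0
```

In this toy case the rates settle where each session's incremental reward equals the link's incremental cost, which is the interior optimality condition; a larger step size could slow or prevent convergence.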

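The speed of convergence of the MCFC algorithm can be significantly
improved by incorporating the second derivatives of the cost function in the
evaluation performed in block 14; i.e., the replacement schema is:

$$r_s \leftarrow \max\Bigl(0,\ r_s + \mu \, \frac{h_s(r_s) - \gamma_s(\mathbf{f})}{\xi_s(\mathbf{f}) - h_s'(r_s)}\Bigr), \qquad (10)$$

where

$$\xi_s(\mathbf{f}) = \frac{\partial^2}{\partial r_s^2} \sum_{l=1}^{L} g_l(f^l) = \sum_{l=1}^{L} (\phi_s^l)^2 \, g_l''(f^l). \qquad (11)$$

In single-path routing, equation (11) reduces to a sum over the links $l$ on the path $\pi_s$ of session $s$:

$$\xi_s(\mathbf{f}) = \sum_{l \in \pi_s} g_l''(f^l). \qquad (12)$$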

The precise form that functions $e_s$ and $g_l$ (and, consequently, $h_s$ and $\gamma_s$)
take on is not necessarily critical (as long as the above-mentioned conditions are
maintained), but it is useful to have a better appreciation for the effects of those
functions. To that end, an incremental reward function of the form

$$h_s(r_s) = \left(\frac{a_s}{r_s}\right)^{v_s} \qquad (13)$$

is considered for some positive values of $a_s$ and $v_s$. When at the optimum rate
the medium-term average transmission rate is $r_s^*$ and $h_s = \gamma_s$, it follows that

$$r_s^* = \frac{a_s}{\gamma_s^{1/v_s}}, \qquad (14)$$

and taking the derivative of equation (13) with respect to $r_s^*$ and rearranging terms
yields

$$\frac{dr_s^*}{r_s^*} = -\frac{1}{v_s} \, \frac{d\gamma_s}{\gamma_s}. \qquad (15)$$

A few observations can be made in connection with the equation (13) function.
First, it may be noted that the allocated rate is proportional to $a_s$. Therefore, a
session with a large amount of traffic may be accommodated by assigning to it a
large $a_s$. Next, it may be noted (from equation (14)) that as congestion builds up
in the network and $\gamma_s$ increases, the allocated session rate decreases and the
change is inversely proportional to $v_s$. The measure of increase and decrease
is sensitive to the value of $v_s$. That means that two sessions that are equivalent
in all other respects (and both use the incremental reward function of equation
(13)), will cause their transmission rate to change differently if they are directed to
use different values of $v_s$.

Realization of this fact suggests that $v_s$ can be used as a priority
assignment to sessions. Sessions with larger $v_s$ are cut less severely in response
to network congestion. Correspondingly, a larger $v_s$ makes sessions less
sensitive to the number of hops they must traverse in the network. It should be
mentioned, perhaps, that any advantage gained from setting $v_s$ at some level is
only relative. If all sessions are assigned a large $v_s$, the congestion measures $\gamma_s$
will increase until everybody is cut back to the proper usage level.
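
The priority effect can be checked numerically with a short Python sketch. It assumes, as one reading of the reward function discussed above, an incremental reward of the form $(a_s / r_s)^{v_s}$, so that the rate at which reward and congestion measure balance is $a_s / \gamma_s^{1/v_s}$. The function name `optimal_rate` and the numbers are illustrative only.

```python
def optimal_rate(a, v, gamma):
    """Rate at which an assumed reward (a / r) ** v equals the congestion measure gamma."""
    return a / gamma ** (1.0 / v)


# Two otherwise identical sessions see the congestion measure double from 1.0 to 2.0.
low_priority = [optimal_rate(1.0, 1.0, g) for g in (1.0, 2.0)]   # exponent v = 1
high_priority = [optimal_rate(1.0, 4.0, g) for g in (1.0, 2.0)]  # exponent v = 4
```

Under these assumptions the session with exponent 1 is cut to half its rate, while the session with exponent 4 keeps roughly 84 percent of its rate, which matches the observation that sessions with a larger exponent are cut less severely.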

Another form for the incremental reward function, which may be quite 
useful for the current Internet realization, is 

20 K-K^~^. (16) 

for some positive value of $v_s$. Here, too, $v_s$ can be used to effectively control
priority, as long as r/^ » 7^ . 

One way to look at the reason for including the $\sum_{l=1}^{L} g_l(f^l)$ component in
equation (2) is that inclusion of this term inhibits the algorithm from driving the
network links into congestion by accepting too much traffic from the sessions.
This imposes at least one condition on the cost function $g_l(f^l)$ and on
its derivative, $g_l'(f^l)$. Specifically, if the desired cap on the probability of loss in a
link $l$, $\Lambda^l$, is set to $\Lambda_0^l$, and $f_0^l$ is the link flow at which that cap is reached,
then the derivative of the cost function should be such that
$g_l'(f^l) = \infty$ for $f^l = f_0^l$. From this, it follows that $f^l < f_0^l$, since the cost of
reaching $f_0^l$ is infinite.
To illustrate, one such function may be of the form

(17)

for some positive-valued $v$. As $v$ is decreased, $g_l'(f^l)$ becomes steeper, which,
on the one hand, increases the link utilization at the optimal point but, on the other
hand, reduces the speed of convergence.

The incremental congestion cost of a link is specified in equation (17) as an
explicit function of the link flow. Therefore, in the actual running of the algorithm,
the link flow must be measured, in order for the function $g_l'(f^l)$ to be evaluated.
Alternatively, it is possible to use the average queue length of a link as the
measurement parameter based on which the incremental congestion cost is
specified. Thus, if $\eta^l$ denotes the average queue length of link $l$, and $\eta_0^l$ denotes
the average queue length corresponding to flow $f_0^l$ in link $l$, then $g_l'(f^l)$ might
advantageously be specified by:

(18)
The congestion avoidance property discussed above, which arises from
setting $g_l'(f^l) = \infty$ for $f^l = f_0^l$, hinges on the ability to specify the threshold
parameters $f_0^l$ in equation (17), or $\eta_0^l$ in equation (18), based on the desired loss
probability cap $\Lambda_0^l$. The relationship between these parameters
depends on the statistics of the traffic passing through the link, which is not easily
predictable. Therefore, the threshold parameter of choice, i.e., $f_0^l$ or $\eta_0^l$, must be
specified in anticipation of likely changes in traffic statistics, such as burstiness. A
main distinction between defining the incremental congestion cost directly in terms
of $f^l$, or implicitly in terms of $\eta^l$, is in the sensitivity of the corresponding
threshold parameter to the traffic statistics. Intuitively, it seems that $\eta_0^l$ should be
less sensitive than $f_0^l$ to changes in traffic statistics, suggesting that the
incremental congestion cost should be specified in terms of the average queue
length.

Returning to the iterative optimization method of the MCFC algorithm, the
main difficulty facing the realization of the equation (9) MCFC algorithm is the
distributed computation of the congestion measures $\gamma_s$. In a network with a highly
developed network layer, the task of computing congestion measures and
distributing them to the corresponding sessions (or access points) can be
performed by a specially designed network layer protocol, in possible cooperation
with the routing protocol. In the Internet or other IP networks, realization of the
MCFC algorithm is more challenging, since it needs to be carried out without
explicit knowledge of the network's routing parameters and without cooperation
from an IP layer.

The following discloses two realizations for the MCFC algorithm at the
transport layer of an IP network: an exact realization requiring modest cooperation
by network switches, and a coarse realization with no such requirement. The
latter is directly applicable to the current realization of the Internet, whereas the
former requires a modest enhancement to the Internet.

Exact Realization

Distributed execution of the MCFC algorithm by diverse, independent
sessions is possible if the sessions have a way of evaluating the corresponding
congestion measures. There are two basic requirements for the evaluation of
congestion measures, $\gamma_s$, by a session $s$. First, at each link $l$ there must be a
local capability to evaluate the incremental congestion cost $g_l'(f^l)$ on an ongoing
basis. Second, there must be a way of communicating this information to the
sessions traversing link $l$. This is achieved by modifying the switches (or routers,
or any other multiplexing points) in the Internet to include the following
capabilities:
• Each switch in the network has the capability of estimating $g_l'(f^l)$ for each
link $l$ originating from it. This estimation is performed on an ongoing basis.

• Some of the data packets traversing the network are marked by the source (or
the access point) as probe packets. Each probe packet carries user data, and
also includes a short congestion field to carry congestion information. A probe
packet begins its journey with this field set to zero.

• Each switch in the network, before forwarding a received probe packet over an
outgoing link $l$, increments the packet's congestion field by the current
estimate of the link's incremental cost $g_l'(f^l)$.

A "switch," in the context of this disclosure, can be a router, a multiplexer, or
the like.

In this manner, as a probe packet traverses the network on its way to its
destination, the congestion field continues to be incremented and thereby
constructs a measure of equation (6). It can be shown that for multiple-path
routing arrangements, the expected value in the congestion field of the probe
packet, upon arrival at the destination, is $\gamma_s$. For single-path routing
arrangements, the value in the congestion field of the probe packet, upon arrival
at the destination, actually corresponds to $\gamma_s$. Thus, in a single-path routing
arrangement, the value of a session's congestion measure at any given time can
be obtained from a single probe packet. Also, for single-path routing, the
second-derivative term $\xi_s$ of equation (12) can be
determined based on an identical approach; it suffices to designate a new field in
each probe packet for the second derivative information and have this field be
incremented by each visited switch in a similar fashion.
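
The probe mechanism can be sketched in Python as follows. The names `ProbePacket` and `Switch`, the field layout, and the cost values are hypothetical; the sketch only shows a congestion field that starts at zero and is incremented by each switch's current incremental cost estimate for the outgoing link.

```python
from dataclasses import dataclass


@dataclass
class ProbePacket:
    payload: bytes                 # a probe packet still carries user data
    congestion_field: float = 0.0  # starts its journey at zero


class Switch:
    def __init__(self, incremental_costs):
        # Current estimate of the incremental congestion cost of each outgoing link.
        self.incremental_costs = incremental_costs

    def forward(self, packet, outgoing_link):
        """Add the outgoing link's incremental cost to the probe before forwarding it."""
        packet.congestion_field += self.incremental_costs[outgoing_link]
        return packet


# Single-path example: three switches, each forwarding over one link of the path.
path = [
    (Switch({"l1": 0.25}), "l1"),
    (Switch({"l2": 0.5, "l9": 2.0}), "l2"),  # l9 is not on this session's path
    (Switch({"l3": 1.0}), "l3"),
]
probe = ProbePacket(payload=b"user data")
for switch, link in path:
    probe = switch.forward(probe, link)
```

On arrival, the congestion field holds the sum of the incremental costs of the links actually traversed, which for single-path routing is the session's congestion measure.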

Ideally, one would like to see the network traffic remain stationary until the
algorithm converges to its optimal point. In real network operation, however, due
to quasi-static traffic changes, the optimal point is not stationary and may be
viewed as a moving target that the algorithm tries to reach. Although this target
may not be reached exactly, with a sufficient speed of convergence, the algorithm
should be able to keep up with the pace of network changes and follow the
optimal point relatively closely. Since the network traffic is an aggregation of
traffic from many sources, its changes are typically slower than the dynamics of
individual sessions.

In general, a distributed algorithm may be executed either synchronously
or asynchronously. In a loosely connected network such as the Internet,
synchronous execution of the algorithm by various sessions is not feasible. Moreover, the
potential benefit of synchronous execution in terms of providing faster
convergence is either minimized or totally removed by the quasi-static traffic
variations.

In an asynchronous implementation, each session updates its input rate
without timing coordination with other sessions. To increase the speed of
convergence, the session congestion measures should be updated regularly,
based on regular transmission of probe packets. Similarly, each link should
update its incremental congestion cost on a regular basis. Evaluation of session
congestion measures and link incremental costs should involve a limited memory
span, so that the information regarding past network status is slowly forgotten and
replaced by the more recent network conditions. This goal may be accomplished
by updating session congestion measures and link average queue lengths by
using, for example, the following exponentially weighted running averages:

$$\gamma_s \leftarrow (1 - \beta_s) \, \gamma_s + \beta_s \, \gamma_s^p \qquad (19)$$

and

$$\eta^l \leftarrow (1 - \beta^l) \, \eta^l + \beta^l \, \eta_t^l, \qquad (20)$$

where $\gamma_s^p$ is the congestion field of the received probe packet $p$, and $\eta_t^l$ is the
queue length at the time $t$. The update of the average queue length of equation
(20) is based on the presumption that the incremental link cost functions, $g_l'(\cdot)$,
are expressed as a function of the average queue lengths, $\eta^l$. If, instead,
functions $g_l'(\cdot)$ are expressed as a function of $f^l$ (e.g., equation (17)), then we
should update the link flows $f^l$ rather than queue lengths $\eta^l$.
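
The exponentially weighted running average used for both the session congestion measure and the link queue length can be sketched in Python as below. The function name `ewma`, the weight of 0.5, and the sample values are illustrative assumptions.

```python
def ewma(current, sample, beta):
    """Exponentially weighted running average: keep (1 - beta) of the old value."""
    return (1.0 - beta) * current + beta * sample


# A session's congestion measure, updated from the congestion field of each probe.
gamma = 0.0
for probe_congestion_field in [1.0, 1.0, 1.0, 1.0]:
    gamma = ewma(gamma, probe_congestion_field, beta=0.5)
```

With a weight of 0.5 the estimate closes half of the remaining gap on each probe, so older network conditions are forgotten quickly; a smaller weight would give a longer memory span and a smoother but slower estimate.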

The choice of the repetition rate at which the updates in $\gamma_s$ and $\eta^l$ are
made involves a trade-off between accurately measuring traffic conditions in the
network and quickly responding to them. Conceptually, it seems desirable to apply
the same repetition rate to the evaluation of link incremental costs throughout the
network. However, due to the wide range of link and session transmission rates in
a diverse network such as the Internet, it may prove inevitable that different
switches would be updating their $g_l'(f^l)$ estimates at different rates and that
different sessions would update their $\gamma_s$ and $r_s$ at different frequencies.

Once a session's congestion measure is evaluated, the session's rate can
be updated through

$$r_s \leftarrow \max\bigl(r_s^{\min},\ \bar{r}_s + \mu \, (h_s(\bar{r}_s) - \gamma_s)\bigr), \qquad (21)$$

where $r_s^{\min}$ is a small rate initially allocated to each new session $s$ to enable
transmission of probe packets needed for the initial evaluation of congestion
measure, and $\bar{r}_s$ is the actual utilized or attained rate. It may be noted that a
session need not execute the evaluations of equations (19) and (21) with the
same frequency. The congestion measure is updated each time a new probe
packet is received, while the rate may be updated at the same time, or less
frequently.

An alternative to explicitly updating the congestion measure through
equation (19) and using it for rate updates is to update the rate directly based
on the congestion field of the received probe packets $p$:

$$r_s \leftarrow \max\bigl(r_s^{\min},\ \bar{r}_s + \varepsilon\,(h_s(\bar{r}_s) - \gamma^{(p)})\bigr). \qquad (22)$$

One can easily verify that the statistical average of the rate change in
equation (22) is identical to the rate change according to equation (21),
provided that the right step size $\varepsilon$ is used. Although in this
approach the congestion measure is not explicitly determined, updating the rate
through equation (22) amounts to maintaining an implicit estimation of the
congestion measure.
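
The averaging argument above can be checked numerically. The sketch below assumes equal step sizes in the two updates and ignores the floor imposed by the `max` operation; all names, and the hypothetical reward function `example_reward`, are illustrative assumptions rather than part of the disclosure.

```python
def direct_rate_update(r_attained, gamma_p, h_s, eps, r_min):
    """Rate update driven by a single probe packet, as in equation (22)."""
    return max(r_min, r_attained + eps * (h_s(r_attained) - gamma_p))


def mean_direct_change(r_attained, probe_fields, h_s, eps):
    """Average rate change over several probe readings (floor ignored)."""
    changes = [eps * (h_s(r_attained) - gamma_p) for gamma_p in probe_fields]
    return sum(changes) / len(changes)


def averaged_change(r_attained, probe_fields, h_s, step):
    """Rate change when the averaged congestion measure is used instead."""
    gamma_s = sum(probe_fields) / len(probe_fields)
    return step * (h_s(r_attained) - gamma_s)


def example_reward(rate):
    """Hypothetical positive, decreasing incremental reward (illustration only)."""
    return 1.0 / (1.0 + rate)
```

Because the update is linear in the congestion field, averaging the individual changes gives the same result as applying the update once with the averaged field.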
One realization of an algorithm comporting with the principles disclosed
herein is presented in FIG. 4. Therein, in block 10 the receiving end in the
session selects an initial rate, $r_s^{\min}$, and that rate is set as the
maximum allowable rate for the session, $r_s$. Control passes to block 11, where
packets are transmitted, subject to this allowable rate $r_s$. In the course of
the transmission and reception of packets, block 12 evaluates the congestion
measure $\gamma_s(f)$, and block 13 evaluates the attained rate $\bar{r}_s$.
Control then passes to block 14 where the rate is updated per equation (21) or
(22), returning control to block 11.

As stated earlier, when the session is always greedy and utilizes whatever
rate is allocated to it, equation (21) or (22) may be simplified by replacing
$\bar{r}_s$ on the right hand side with $r_s$. With this simplification, block
13 in FIG. 4 can be eliminated.
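
The control loop of FIG. 4 can be sketched as below. This is an illustrative outline only: packet transmission is abstracted behind two caller-supplied callbacks, and `toy_reward` and `toy_congestion` are hypothetical stand-ins for a reward function and for a network's response, introduced solely so the loop can be run.

```python
def run_session(h_s, measure_congestion, measure_attained_rate,
                r_min, mu, iterations, greedy=False):
    """Iterate blocks 10 to 14 of FIG. 4 and return the allowed-rate history."""
    r_s = r_min                                    # block 10: initial rate
    history = [r_s]
    for _ in range(iterations):
        # Block 11: packets are sent subject to r_s (abstracted by callbacks).
        gamma_s = measure_congestion(r_s)          # block 12
        if greedy:
            r_bar = r_s                            # block 13 eliminated
        else:
            r_bar = measure_attained_rate(r_s)     # block 13
        # Block 14: rate update in the form of equation (21).
        r_s = max(r_min, r_bar + mu * (h_s(r_bar) - gamma_s))
        history.append(r_s)
    return history


def toy_reward(rate):
    """Hypothetical incremental reward, decreasing in the rate."""
    return 1.0 / (1.0 + rate)


def toy_congestion(rate):
    """Hypothetical congestion measure that grows with the session's rate."""
    return rate / 10.0
```

With these toy functions and `greedy=True`, the rate settles where the reward equals the congestion measure, which is the balance that the update of block 14 seeks.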

It may also be observed that some information is made available at the
receiver end, and that some information must be communicated to the transmitter
end. Which steps of the algorithm described in FIG. 4 are taken at the receiver
end is not a critical point. Illustratively, the receiver can obtain session
congestion information, send that information to the transmitter, and have the
transmitter end do the rest. On the other extreme, the receiver can evaluate the
new $r_s$, and communicate that value to the transmitter. Obviously, each
approach has different implications on the design of transport protocols, the
control information that must be exchanged between the source and receiver, and
the interaction between error control and congestion control.
Coarse Realization 

In the absence of explicit congestion notification, the only observation a
session can have about the network is through its own performance, i.e., the
loss and delay of its own packets. What is needed is to select a function for
$g'_l(f^l)$ such that the resulting congestion measure $\gamma_s(f)$ (equation
(6)) can be estimated through the available loss and delay information.
Denoting the end-to-end loss probability and the average delay of packets
of session $s$ by $\Lambda_s$ and $D_s$, respectively, and the average delay of
each link $l$ by $D^l$, it can be shown that the delay through the path taken by
packets of session $s$ can be expressed by:

$$D_s(f) = \sum_{l=1}^{L} \varphi_s^l\, D^l(f^l), \qquad (23)$$

and that the losses in the path taken by packets of session $s$ can be expressed
by:

$$\Lambda_s(f) \approx \sum_{l=1}^{L} \varphi_s^l\, \Lambda^l(f^l), \qquad (24)$$

where the approximation of equation (24) is valid as long as $\Lambda_s \ll 1$.
Employing equations (23) and (24) together with $g'_l(f^l)$ defined by

$$g'_l(f^l) = \nu\, D^l(f^l) + \Lambda^l(f^l), \qquad l = 1, 2, \ldots, L, \qquad (25)$$

converts equation (4) to:

$$\gamma_s(f) = \nu\, D_s(f) + \Lambda_s(f). \qquad (26)$$

In accordance with well known prior art techniques, a session can estimate
the average delay and loss probability associated with its own transmissions
and, therefore, equation (26) offers a means for estimating $\gamma_s$ from the
estimates of the average delay and loss probability. The cost function specified
in equation (25) meets the convexity requirement, since $D^l(f^l)$ and
$\Lambda^l(f^l)$ are both increasing functions of $f^l$.

While there clearly is a positive correlation between the average delay and
the level of congestion on a link, average delay, in and of itself, is not
indicative of congestion. Other information, such as the propagation delay and
the available buffer space (or the acceptable range of queueing delays), is
essential to infer the level of congestion associated with a given average
delay. In contrast, the loss probability provides a more conclusive indication
of the severity of congestion. This suggests that the cost function of equation
(25) can be modified to merely consider probability of packet loss; i.e.,
modified to

$$g'_l(f^l) = \Lambda^l(f^l), \qquad (27)$$

which converts equation (26) to:

$$\gamma_s(f) = \Lambda_s(f). \qquad (28)$$
Noting the caveat expressed in connection with equation (24), if a large
fraction of losses is due to transmission error, as could be the case in
wireless communications, link loss probability cannot be trusted as a good
indicator of congestion.

The strong congestion avoidance property that came about when the cost
function was defined earlier in terms of $g'_l(f^l) = \infty$ at the maximum
permissible link load does not apply with link cost functions chosen in
accordance with equation (27). In fact, it is easy to see that if link cost
functions of equation (27) are used in conjunction with unbounded session reward
functions such as in equation (13), the MCFC algorithm could drive the network
into heavy congestion. If, on the other hand, the reward functions are
appropriately bounded, small loss probabilities can still be guaranteed at the
optimal point of the algorithm. One such incremental reward function is
disclosed above in equation (16). In such an arrangement, $\gamma_s$ may be
estimated using an exponentially weighted running average algorithm, whereby

$$\gamma_s \leftarrow \begin{cases} (1-\rho_s)\,\gamma_s + \rho_s & \text{packet loss,} \\ (1-\rho_s)\,\gamma_s & \text{successful transmission.} \end{cases} \qquad (29)$$

In comparison to the exact realization and equation (19), this is analogous
to viewing every packet, $p$, as a probe packet with the hypothetical congestion
field $\gamma^{(p)}$, which is equal to 1 if $p$ is lost and equal to 0
otherwise.

The algorithmic similarities between estimating $\gamma_s$ in the coarse and
exact realizations should not obscure a fundamental difference between the two
cases regarding the range of statistical fluctuations in $\gamma^{(p)}$ and the
accuracy of estimations. For instance, in the exact realization in a network
with single-path routing, one probe packet is enough to determine the congestion
measure. In the coarse realization, on the other hand, the analogous parameter,
$\gamma^{(p)}$, associated with each packet $p$, is either one or zero, with an
average typically in the order of a few percent or less. Due to the random
nature of $\gamma^{(p)}$, a much larger number of observations is necessary
before the algorithm of equation (29) converges to a reasonable estimation of
the end-to-end loss probability. As a numerical example, if $\Lambda_s = 0.01$,
typically one out of every 100 packets is lost, implying that at least several
hundred observations are needed for a meaningful estimation of $\Lambda_s$. This
sharp difference from the exact realization is the result of restricting
information about network status to the packet losses that are locally observed.
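
The loss-driven running average of equation (29), and the reason so many observations are needed, can be illustrated as follows. This is a sketch under stated assumptions: the names are chosen here for illustration, and `relative_std_error` uses a plain sample mean of independent losses as a rough stand-in for the running average, which is a simplification and not part of the disclosure.

```python
import math


def update_loss_estimate(gamma_s, packet_lost, rho_s):
    """One step of the running average of equation (29).

    Each packet acts as a probe whose congestion field is 1 if the
    packet was lost and 0 if it was delivered.
    """
    gamma_p = 1.0 if packet_lost else 0.0
    return (1.0 - rho_s) * gamma_s + rho_s * gamma_p


def relative_std_error(loss_probability, observations):
    """Relative standard error of a sample-mean loss estimate.

    Assumes independent losses; intended only as a rough guide to how
    the number of observed packets affects estimation accuracy.
    """
    variance = loss_probability * (1.0 - loss_probability) / observations
    return math.sqrt(variance) / loss_probability
```

Under these assumptions, with a loss probability of 0.01 the relative error after 100 packets is close to 100 percent, and it falls only with the square root of the number of packets observed.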

One way to run the coarse MCFC algorithm is to update the rate via
equation (21), based on explicit estimation of $\gamma_s$ obtained in equation
(29). An alternative approach, like in the exact realization, is to directly
update the rate, upon observing each new loss or successful transmission, by way
of equation (22). In the coarse realization, due to the wide random fluctuations
of $\gamma^{(p)}$, equation (22) effectively constitutes a stochastic process. A
small $\varepsilon$ prolongs the time necessary for the rate of new sessions to
reach the final value. A large $\varepsilon$, on the other hand, gives rise to
large oscillations in the session rates, induced by the random fluctuations of
$\gamma^{(p)}$. This difficulty can be overcome by adopting a variable step size
in equation (22), i.e., adjusting $\varepsilon$ as a function of iteration
number, session rate, or some other parameter.

For the coarse realization, equation (22) can be restated as:

$$r_s \leftarrow \begin{cases} r_s + a_s(r_s) & \text{successful transmission,} \\ \max\bigl(r_s^{\min},\ r_s - b_s(r_s)\bigr) & \text{packet loss,} \end{cases} \qquad (30)$$

where

$$a_s(r_s) = \varepsilon_s(r_s)\, h_s(r_s), \qquad (31)$$

and

$$b_s(r_s) = \varepsilon_s(r_s)\,\bigl(1 - h_s(r_s)\bigr). \qquad (32)$$

The term $\varepsilon_s$ in the above equations is denoted as a function of
$r_s$ in order to emphasize the possibility of changing the step size during the
course of the algorithm, based on the value attained by $r_s$ (or some other
criteria). According to equation (30), a session's rate must be increased by
$a_s(r_s)$ each time a packet is successfully transmitted, and reduced by
$b_s(r_s)$ each time a packet loss is observed.
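
The per-packet rule just described can be sketched as follows. This is a minimal illustration under stated assumptions: the increase is taken as the step size times the incremental reward, and the decrease as the step size times one minus the incremental reward, which is what equation (22) yields when the congestion field is 0 for a delivered packet and 1 for a lost one. The names, the hypothetical reward function `bounded_reward`, and the constant step size `constant_step` are illustrative only.

```python
def coarse_rate_update(r_s, packet_lost, h_s, eps_s, r_min):
    """Adjust the session rate after observing the fate of one packet.

    h_s   : callable giving the incremental reward at a rate (assumed <= 1)
    eps_s : callable giving the step size at a rate
    r_min : floor below which the rate is never reduced
    """
    step = eps_s(r_s)
    if packet_lost:
        decrease = step * (1.0 - h_s(r_s))
        return max(r_min, r_s - decrease)
    increase = step * h_s(r_s)
    return r_s + increase


def bounded_reward(rate):
    """Hypothetical bounded, decreasing incremental reward (illustration only)."""
    return 0.5 / (1.0 + rate)


def constant_step(rate):
    """Hypothetical step size that does not vary with the rate."""
    return 0.5
```

Under these assumptions, if packets are lost with probability equal to the incremental reward at the current rate, the expected increase and the expected decrease cancel, so the rate hovers around the point where the reward matches the loss probability.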
Claims:

1. In a network that carries traffic of a plurality of sessions, a method,
carried out by one of said sessions, comprising the steps of:

evaluating a session congestion measure that is related to congestion
information on links of said network which carry incoming traffic to a receiving
end of said session;

evaluating a session incremental reward function that is related to rate of
said incoming traffic;

evaluating a new rate of said incoming traffic that moves said rate of said
incoming traffic in a direction that minimizes a global network cost function
which combines cost functions assigned to said sessions and congestion cost
functions assigned to said links.



2. The method of claim 1 where said session incremental reward function
is the negative of a derivative, with respect to rate of said incoming traffic,
of said one of said cost functions assigned to said session.

3. The method of claim 1 where said session congestion measure is a
derivative, with respect to said rate of said incoming traffic, of a sum of
congestion cost functions assigned to links employed by said session.

4. The method of claim 1 where said congestion cost function assigned to
a link is very large for link loads in excess of a selected threshold, chosen as
maximum permissible link load.

5. The method of claim 1 where said new rate is an incremental change
from said rate of said incoming traffic of said session, where the incrementing
is determined based on said session incremental reward function and said session
congestion measure.

6. The method of claim 1 where said step of evaluating a new rate is
carried out at a receiving end of said session, and said method further
comprises a step of communicating information to a sending end of said session,
to change said rate of said incoming traffic towards said new rate.

7. The method of claim 1 where said step of evaluating a new rate is
carried out at a sending end of said session and includes a step of receiving at
said sending end results of said step of evaluating said session congestion
measure.

8. The method of claim 5 where said new rate developed is an incremental
change arrived at through an additive factor.



9. The method of claim 8 where said new rate, $r_s$, is determined based on
an auxiliary parameter, $\tilde{r}_s = r_s + \mu\cdot(h_s(r_s) - \gamma_s(f))$,
where $\mu$ is a multiplicative step size coefficient, $r_s$ is an assigned rate
of incoming traffic at time of said evaluating, $h_s(r_s)$ is said session
incremental reward function, and $\gamma_s(f)$ is said session congestion
measure.

10. The method of claim 9 where said new rate, $r_s$, is determined by

$r_s \leftarrow \tilde{r}_s$ if $r_s^{\min} < \tilde{r}_s < r_s^{\max}$
$r_s \leftarrow r_s^{\min}$ if $\tilde{r}_s < r_s^{\min}$
$r_s \leftarrow r_s^{\max}$ if $r_s^{\max} < \tilde{r}_s$
where $r_s^{\min} > 0$.

11. The method of claim 8 where said new rate, $r_s$, corresponds to the
larger of $r_s^{\min}$ or $\bar{r}_s + \mu\cdot(h_s(\bar{r}_s) - \gamma_s(f))$,
where $r_s^{\min}$ is a given initial rate that is greater or equal to 0, $\mu$
is a multiplicative step size coefficient, $\bar{r}_s$ is an attained average
rate of incoming traffic at time of said evaluating, $h_s(r_s)$ is said session
incremental reward function, and $\gamma_s(f)$ is said session congestion
measure.



12. The method of claim 1 where said session incremental reward function
is a positive, decreasing, function with respect to session rate.

13. The method of claim 1 where said session incremental reward function
is a positive, decreasing, function with respect to session rate, starting at a
minimum session rate, $r_s^{\min} > 0$, where the incremental reward function is
a very large value at $r_s = r_s^{\min}$.

14. The method of claim 1 where a derivative of each of said link cost
functions is a positive, increasing function with respect to rate of traffic on
the link.

15. The method of claim 1 where a derivative of each of said link cost
functions is a positive, increasing, function of an average queue length in said
link.

16. The method of claim 1 where the derivative of the congestion cost 
function for a link of said network is defined by = \ — ~ — , 

where $\nu$ is a positive constant, $f^l$ is the average traffic flow rate at
link $l$, and $\eta^l$ is the average queue length in link $l$.

17. The method of claim 1 where said session incremental reward function,
$h_s(r_s)$, corresponds to



where and are selected positive constants. 



18. The method of claim 17 where different ones of said plurality of
sessions employ different values of .

19. The method of claim 17 where different ones of said plurality of
sessions employ different values of to achieve different levels of priority.

20. The method of claim 1 where said second incremental cost, $h_s$,

corresponds to h^.^ — - — , where h^^ , n , and are selected constants for 
each of said sessions. 

21. The method of claim 1 where said incoming traffic comprises packets,
and all packets of said incoming traffic of said session traverse the same path
that includes a given subset of links of said network.

22. The method of claim 21 where said new rate is incrementally changed
from said rate of said incoming traffic of said session, where the incrementing
is related to said session incremental reward function and said session
congestion measure, $\gamma_s(f)$, defined by
$\gamma_s(f) = \sum_{l \in L_s} g'_l(f^l)$, where $L_s$ is said subset of links,
and $g'_l(f^l)$ is the derivative of said session congestion cost function of
link $l$.

23. The method of claim 1 where said incoming traffic comprises packets
where some of said packets traverse one subset of links of said network, and at
least some others of said packets traverse a different subset of links.

24. The method of claim 23 where said new rate is incrementally changed
from said rate of said incoming traffic of said session, where the incrementing
is related to said session incremental reward function and said session
congestion measure, $\gamma_s(f)$, defined by
$\gamma_s(f) = \sum_{l=1}^{L} \varphi_s^l\, g'_l(f^l)$, where $\varphi_s^l$
corresponds to a fraction of said packets of said incoming traffic that flows
through link $l$, and $g'_l(f^l)$ is the derivative of said congestion cost
function of link $l$.

25. The method of claim 1 where said incoming traffic originates at a
sending end, and said sending end includes in said incoming traffic probe
packets that include at least one congestion field that is modified by network
nodes through which said probe packets traverse.

26. The method of claim 25 where said probe packets are transmitted by
said sending end at regular intervals.

27. The method of claim 26 where said probe packets also carry
information for said receiving end.

28. The method of claim 25 where each of said nodes through which a
probe packet traverses, updates a first one of said congestion fields based on a
current estimate of the incremental cost, $g'_l(f^l)$, of a link through which
said probe packet leaves said node.

29. The method of claim 25 where each of said nodes through which a
probe packet traverses increments a first one of said congestion fields by a
current estimate of the incremental cost, $g'_l(f^l)$, of a link through which
said probe packet arrives at said node.

30. The method of claim 29 where each of said nodes through which a
probe packet traverses modifies a second one of said congestion fields based on
a current estimate of the second derivative, $g''_l(f^l)$, of said session
congestion function of a link through which said probe packet leaves at said
node.

31. The method of claim 30 where information received at said receiving
end of said session from said second one of said congestion fields is employed
to control said rate of said incoming traffic.

32. The method of claim 30 where said step of evaluating said session
congestion measure employs information contained in said at least one
congestion field of probe packets received in said incoming traffic and in said
second one of said congestion fields.

33. The method of claim 1 where said step of evaluating said session
congestion measure replaces a current value of said session congestion measure,
$\gamma_s$, with a new value of said session congestion measure in accordance
with $\gamma_s^{\text{new}} = (1-\rho)\,\gamma_s + \rho\,\gamma^{(p)}$, where
$\rho$ is a selected constant that is less than 1, and $\gamma^{(p)}$ is the
value of said at least one congestion field of a received probe packet.

34. The method of claim 1 where said step of evaluating said session
congestion measure equates said session congestion measure to the value of
said at least one congestion field of a received probe packet.

35. The method of claim 1 where said step of evaluating said session
congestion measure is based on probability of packet loss experienced at said
receiving end.

36. The method of claim 35 where said rate of said incoming traffic is
controlled in accordance with

$r_s \leftarrow r_s + a_s(r_s)$

when there are no packets lost, and in accordance with

$r_s \leftarrow \max\bigl(r_s^{\min},\ r_s - b_s(r_s)\bigr)$

when there are packets lost, where
$a_s(r_s) = \varepsilon_s(r_s)\,h_s(r_s)$,
$b_s(r_s) = \varepsilon_s(r_s)\,(1 - h_s(r_s))$, and
$\varepsilon_s(r_s)$ is a step size.
Abstract

In a class of minimum cost flow control (MCFC) algorithms for adjusting
session rates or window sizes, congestion control is achieved through
consideration of an incremental cost function that addresses link congestion,
and an incremental cost function that addresses the cost of providing less than
the desired transmission rate. A coarse version of the algorithm is geared
towards implementation in the current Internet, relying on end-to-end packet
loss observations as an indication of congestion. A more complete version
anticipates an Internet where sessions can solicit explicit congestion
information through a concise probing mechanism.
As the below named inventor, I hereby declare that:

My residence, post office address and citizenship are as stated below next to my name. 

I believe I am the original, first and sole inventor of the subject matter which is claimed
and for which a patent is sought on the invention entitled End-to-End Internet Congestion
Control, the specification of which is attached hereto.

I hereby state that I have reviewed and understand the contents of the above identified 
specification, including the claims, as amended by an amendment, if any, specifically referred to 
in this oath or declaration. 

I acknowledge the duty to disclose all information known to me which is material to 
patentability as defined in Title 37, Code of Federal Regulations, 1.56. 

I hereby claim foreign priority benefits under Title 35, United States Code, 119 of any 
foreign application(s) for patent or inventor's certificate listed below and have also identified 
below any foreign application for patent or inventor's certificate having a filing date before that 
of the application on which priority is claimed: 

None 

I hereby claim the benefit under Title 35, United States Code, 120 of any United States 
application(s) listed below and, insofar as the subject matter of each of the claims of this 
application is not disclosed in the prior United States application in the manner provided by the 
first paragraph of Title 35, United States Code, 112, I acknowledge the duty to disclose all 
information known to me to be material to patentability as defined in Title 37, Code of Federal 
Regulations, 1.56 which became available between the filing date of the prior application and the
national or PCT international filing date of this application: 

None 

I hereby declare that all statements made herein of my own knowledge are true and that 
all statements made on information and belief are believed to be true; and further that these 
statements were made with the knowledge that willful false statements and the like so made are 
punishable by fine or imprisonment, or both, under Section 1001 of Title 18 of the United States 
Code and that such willful false statements may jeopardize the validity of the application or any 
patent issued thereon. 

I hereby appoint the following attorney(s) with full power of substitution and revocation,
to prosecute said application, to make alterations and amendments therein, to receive the patent, 
and to transact all business in the Patent and Trademark Office connected therewith: 